# FP8 Quantized Inference
## Qwen3 32B FP8
Qwen · Apache-2.0 · Large Language Model · Transformers · 29.26k downloads · 47 likes

Qwen3-32B-FP8 is the latest 32.8B-parameter large language model in the Qwen series, supporting switching between thinking and non-thinking modes and offering exceptional reasoning, instruction-following, and agent capabilities.
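The FP8 checkpoint embeds its quantization config, so it loads like any other causal-LM checkpoint in Transformers, and the thinking/non-thinking switch is exposed through the chat template. A minimal sketch, assuming the Hugging Face repo id `Qwen/Qwen3-32B-FP8` and the `enable_thinking` template flag described in the Qwen3 model cards:

```python
# Minimal sketch: load the FP8 checkpoint with Transformers and toggle thinking mode.
# Assumes the repo id "Qwen/Qwen3-32B-FP8" and the `enable_thinking` chat-template
# flag from the Qwen3 model cards; requires a recent transformers release and a GPU
# stack that supports the checkpoint's quantization config.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-32B-FP8"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # dtype/quantization picked up from the checkpoint config
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize FP8 quantization in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    enable_thinking=False,  # switch between thinking and non-thinking modes
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

With `enable_thinking=True`, the Qwen3 documentation describes the model emitting its reasoning in a `<think>` block before the final answer; the flag above disables that for plain responses.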
## Qwen3 8B FP8
Qwen · Apache-2.0 · Large Language Model · Transformers · 22.18k downloads · 27 likes

Qwen3-8B-FP8 is the latest version in the Qwen series of large language models, offering FP8 quantization, seamless switching between thinking and non-thinking modes, and powerful reasoning capabilities with multilingual support.
## Qwen2.5 VL 72B Instruct FP8 Dynamic
parasail-ai · Apache-2.0 · Image-to-Text · Transformers · English · 78 downloads · 1 like

FP8-quantized version of Qwen2.5-VL-72B-Instruct, supporting vision-and-text input with text output, optimized and released by Neural Magic.
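FP8-dynamic checkpoints in this format are typically served with vLLM, which reads the quantization config directly from the checkpoint. A hedged sketch, assuming a repo id of `parasail-ai/Qwen2.5-VL-72B-Instruct-FP8-Dynamic` (not confirmed by the listing) and vLLM's OpenAI-style `chat` interface for passing an image:

```python
# Hedged sketch: serve the FP8-dynamic VL checkpoint with vLLM and send one image.
# The repo id below is an assumption based on the listing; substitute the real one.
from vllm import LLM, SamplingParams

llm = LLM(
    model="parasail-ai/Qwen2.5-VL-72B-Instruct-FP8-Dynamic",  # assumed repo id
    max_model_len=8192,
    limit_mm_per_prompt={"image": 1},  # one image per prompt for this example
    tensor_parallel_size=4,            # a 72B model generally needs several GPUs; adjust to your hardware
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
        {"type": "text", "text": "Describe what this image shows."},
    ],
}]

outputs = llm.chat(messages, SamplingParams(max_tokens=256))
print(outputs[0].outputs[0].text)
```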
## Llama 3.1 8B Instruct FP8
nvidia · Large Language Model · Transformers · 3,700 downloads · 21 likes

FP8-quantized version of the Meta Llama 3.1 8B Instruct model, an autoregressive language model with an optimized transformer architecture and support for a 128K context length.
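A common reason to reach for these FP8 checkpoints is weight memory: FP8 stores one byte per parameter versus two for BF16/FP16. A back-of-the-envelope sketch using the nominal parameter counts implied by the model names above (KV cache, activations, and any unquantized layers are ignored):

```python
# Rough weight-memory comparison for the listed models: one byte per parameter in
# FP8 versus two in BF16/FP16. Parameter counts are nominal values taken from the
# model names above, not exact figures from the checkpoints.
GIB = 1024 ** 3

def weight_gib(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GiB for a dense model."""
    return params_billion * 1e9 * bytes_per_param / GIB

models = [
    ("Qwen3-32B", 32.8),
    ("Qwen3-8B", 8.0),
    ("Qwen2.5-VL-72B-Instruct", 72.0),
    ("Llama-3.1-8B-Instruct", 8.0),
]

for name, params in models:
    print(f"{name}: BF16 ≈ {weight_gib(params, 2):.1f} GiB, FP8 ≈ {weight_gib(params, 1):.1f} GiB")
```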